The article discusses on-policy distillation for training language models, emphasizing that smaller, specialized models can outperform larger generalist ones in specific domains. It contrasts on-policy training, in which the model receives direct feedback on its own sampled outputs (as in reinforcement learning), with off-policy training, which imitates outputs drawn from a teacher model and can lead to compounding errors once the student drifts from the teacher's distribution. The piece highlights the importance of choosing the right training approach to maximize model efficiency and accuracy.
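As a rough illustration of the idea, the sketch below shows one way on-policy distillation can be set up: the student samples its own rollouts, and a frozen teacher grades every sampled token with a dense per-token signal (here a reverse KL), rather than the sparse end-of-episode reward of ordinary RL. The toy models, sizes, and loss details are assumptions for illustration only and are not the article's exact recipe.

```python
# Hypothetical on-policy distillation sketch (toy models, invented shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, HIDDEN, PROMPT_LEN, ROLLOUT_LEN = 100, 32, 4, 8

class TinyLM(nn.Module):
    """Toy stand-in for a causal language model (returns next-token logits)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.head = nn.Linear(HIDDEN, VOCAB)

    def forward(self, ids):                      # ids: (batch, seq)
        return self.head(self.embed(ids))        # logits: (batch, seq, vocab)

teacher, student = TinyLM(), TinyLM()
teacher.requires_grad_(False)                    # teacher only provides feedback
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

prompt = torch.randint(0, VOCAB, (2, PROMPT_LEN))

for step in range(3):
    # 1. On-policy rollout: the *student* samples its own continuation,
    #    so training happens on states the student actually visits.
    ids = prompt.clone()
    with torch.no_grad():
        for _ in range(ROLLOUT_LEN):
            next_logits = student(ids)[:, -1]
            next_tok = torch.multinomial(F.softmax(next_logits, -1), 1)
            ids = torch.cat([ids, next_tok], dim=1)

    # 2. Dense feedback: both models score the sampled positions.
    student_logp = F.log_softmax(student(ids)[:, PROMPT_LEN - 1:-1], dim=-1)
    with torch.no_grad():
        teacher_logp = F.log_softmax(teacher(ids)[:, PROMPT_LEN - 1:-1], dim=-1)

    # 3. Per-token reverse KL(student || teacher) on the student's own rollout,
    #    instead of a single sparse reward at the end of the episode.
    loss = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"step {step}: reverse KL = {loss.item():.4f}")
```

The contrast with off-policy distillation is in step 1: an off-policy setup would instead train the student on sequences sampled from the teacher, so any state the student reaches but the teacher never demonstrates goes uncorrected, which is where compounding errors come from.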